Abstract. We describe decomposition during search (DDS), an integration
of And/Or tree search into propagation-based constraint solvers.
The presented search algorithm dynamically decomposes sub-problems 
of a constraint satisfaction problem into independent partial problems, 
avoiding redundant work. 

The paper discusses how DDS interacts with key features that make 
propagation-based solvers successful: constraint propagation, especially 
for global constraints, and dynamic search heuristics. 
We have implemented DDS for the Gecode constraint programming li- 
brary. Two applications, solution counting in graph coloring and protein 
structure prediction, exemplify the benefits of DDS in practice. 



1 Introduction 

Propagation-based constraint solvers owe much of their success to a best-of- 
several-worlds approach: They combine classic AI search methods with advanced 
implementation techniques from the Programming Languages community and 
efficient algorithms from Operations Research. Furthermore, the CP community 
has developed a great number of propagation algorithms for global constraints. 

In this paper, we present how to integrate And/Or search into propagation- 
based constraint solvers. We call the integration decomposition during search 
(DDS). We take full advantage of all the features mentioned above that make 
propagation-based constraint solvers successful. The most interesting points, and 
main contributions of our paper, are how DDS interacts with and benefits from 
constraint propagation, especially in the presence of global constraints, and dy- 
namic search heuristics. We exemplify the profit of DDS by exhaustive solution 
counting, an important application area of decomposing search strategies [3, 8].

Related work. Counting and exhaustive enumeration of the solutions of a
constraint satisfaction problem (CSP) have only recently gained a lot of
interest [1, 3, 8, 21]. In general, the counting of CSP solutions is in the
complexity class #P, i.e. it is even harder than deciding satisfiability [19].
This class was defined by Valiant [24] as the class of counting problems that
can be computed in nondeterministic polynomial time. Notwithstanding the
complexity, there is demand for
solution counting in real applications. For instance, in bioinformatics counting 
optimal protein structures is of high importance for the study of protein energy 
landscapes, kinetics, and protein evolution [20, 25] and can be done using CP [2]. 

It is folklore that standard solving methods for CSPs like Depth-First Search
(DFS) leave room for saving redundant work, in particular when counting all 
solutions [10]. Recent work by Dechter et al. [8, 15] introduced And/Or search for 
solution counting and optimization, which makes use of repeated and-decomposition
during the search following a pseudo-tree. Their work thoroughly studies
and develops a rich theory of And/Or trees. 

While not in the context of general constraint propagation, similar ideas were 
discussed before for SAT-solving [3,5,14]. The SAT approaches also introduce 
the idea of analyzing the induced dependency structure dynamically during the 
search. This avoids redundancy that occurs due to the emergence of independent 
connected components in the dependency graph during the search. 

Motivation and contribution. The motivation for this paper is to tackle the 
same kind of redundancy for solving very hard real world problems, such as the 
counting of protein structures, that require a full-fledged constraint programming
system. This calls for a method that is tailored for integration into
modern CP systems and directly supports features such as global constraints and 
dynamic search heuristics. To make use of the statically unpredictable effects of 
constraint propagation and entailment, the presented method avoids redundant 
search dynamically. 

Our main contribution is to present how to integrate And/Or tree search into 
a state-of-the-art, propagation-based constraint solver. This is exemplified by
extending the Gecode system [11]. We describe decomposition during search (DDS)
on different levels of abstraction, down to concrete implementation details. 

In detail, we stress the impact that constraint propagation has on decompos- 
ability of the constraint graph, and how DDS interacts (and works seamlessly 
together) with propagators for global constraints, the workhorses of modern 
propagation-based solvers. We show that global constraint decomposition is the 
key to enable the application of DDS, and discuss techniques that enable global 
constraint decomposition. The practical value of DDS in the presence of global 
constraints is shown empirically, using a well integrated and competitive imple- 
mentation for the Gecode library. 

Overview. The paper starts with a presentation of the notations and concepts 
that are used throughout the later sections. In Sec. 3, we briefly recapitulate 
And/Or search, and then present, on a high level of abstraction, decomposition 
during search (DDS), our integration of And/Or search into a propagation-based 
constraint solver. Sec. 4 deals with the interaction of DDS with propagation and 
search heuristics. Section 5 discusses how global constraints interact with DDS, 
focusing on decomposition strategies for some important representatives. 

On a lower level of abstraction, Sec. 6 sketches the concrete implementation
of DDS using the Gecode C++ constraint programming library. With the help
of our Gecode implementation, we study the practical impact of DDS in Sec. 7
by counting solutions for random instances of two important CSPs, graph color- 
ing and protein structure prediction. Both examples are hard counting problems 
(in class #P). The study shows high average speedups and reductions in search 
tree size, even using our prototype implementation. These two sections therefore 
provide evidence that DDS can be integrated into a modern constraint program- 
ming system in a straightforward and efficient way. The paper finishes with a 
summary and an outlook on future work in Sec. 8. 

2 Preliminaries 

This section defines the central notions that we want to use to talk about con- 
straint satisfaction problems. 

A Constraint Satisfaction Problem (CSP) is a triple $P = (X, \mathcal{D}, C)$, where
$X = \{x_1, \ldots, x_m\}$ is a finite set of variables, $\mathcal{D}$ a function
mapping variables to their associated value domains, and $C$ a set of constraints.
An $n$-ary constraint $c \in C$ is defined by the tuple of its $n$ variables
$\mathrm{vars}(c)$ and a set of $n$-tuples of the allowed value combinations. We feel
free to interpret $\mathrm{vars}(c)$ as the set of variables of $c$.
A domain $\mathcal{D}$ entails a constraint $c$ if and only if all possible value
combinations of the variable domains of $\mathrm{vars}(c)$ in $\mathcal{D}$ are allowed
tuples for $c$. A solution of a CSP is an assignment of one value
$v \in \mathcal{D}(x_i)$ to each variable $x_i \in X$ such that all $c \in C$ are
entailed. The set of solutions of a CSP $P$ is denoted by $\mathrm{sol}(P)$.

Based on these definitions some important properties of a CSP can be defined. 
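A CSP $P = (X, \mathcal{D}, C)$ is solved if and only if
$\forall x \in X : |\mathcal{D}(x)| = 1$ and $\mathrm{sol}(P) \neq \emptyset$.
$P$ is failed if and only if $\mathrm{sol}(P) = \emptyset$, and satisfiable otherwise.
A CSP $P'$ is stronger than $P$ ($P' \sqsubseteq P$) if and only if
$\mathrm{sol}(P') = \mathrm{sol}(P)$ and $\forall x \in X : \mathcal{D}'(x) \subseteq \mathcal{D}(x)$.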

The constraint graph of a CSP $P = (X, \mathcal{D}, C)$ is a hypergraph $G_P = (V, E)$,
where $V = X$ and $E = \{\mathrm{vars}(c) \mid c \in C\}$.

3 Decomposition During Search 

In this section, we recapitulate And/Or tree search. Then, we present a high- 
level model of how to integrate And/Or tree search into a propagation-based 
constraint solver. We call this integration decomposition during search (DDS). 

3.1 And/Or tree search 

Let us look at an example to get an intuition for And/Or search. Assume 
$P = (X, \mathcal{D}, C)$ with $X = \{A, B, C, D\}$, $\mathcal{D}(A) = \{3, 5\}$,
$\mathcal{D}(B) = \{3, 4\}$, $\mathcal{D}(C) = \mathcal{D}(D) = \{1, 2\}$, and
$C$ = '$A, B, C, D$ are pairwise different'. Figure 1 presents
a corresponding search tree for a plain depth-first tree search. Each node is a 
propagated sub-problem of P and is visualized as a constraint graph. As usual, 
a node is equivalent to the disjunction of all its children. 

Even this tiny example demonstrates that plain DFS may perform redundant
work: The partial problem on the variables C and D is solved redundantly for
each solution of the partial problem on A and B. We say that {A, B} and {C, D}
are independent sets of variables.

Fig. 1. DFS search tree.

Fig. 2. And/Or search tree.

The central idea of And/Or tree search [8] is to detect independent partial 
problems during search, to enumerate partial solutions of the partial problems in- 
dependently, and finally to combine them to solutions, or to compute the number 
of solutions. That way, each independent partial problem is searched only once. 
The name And/Or search stresses that the search tree contains both disjunctive, 
choice nodes (OR) and conjunctive nodes (AND), representing decompositions. 
Figure 2 shows a search tree for the same CSP as in Figure 1, but using And/Or 
search. For now, read the big $\times$ as "combine". Here, the search tree
contains only one decomposition and two choices, instead of five choices in Fig- 
ure 1. In general, the amount of redundant work can be exponential in the size 
of the CSP. 
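
The saving in this example can be checked by brute force. The following is a minimal sketch, not part of the presented system: it enumerates the example CSP once as a whole and once as two partial problems, and confirms that the counts multiply. The helper name `count_all_different` is hypothetical.

```python
from itertools import product

# Domains of the example CSP from Sec. 3.1.
domains = {"A": [3, 5], "B": [3, 4], "C": [1, 2], "D": [1, 2]}

def count_all_different(names):
    """Count assignments to `names` whose values are pairwise different."""
    return sum(
        1
        for values in product(*(domains[n] for n in names))
        if len(set(values)) == len(values)
    )

total = count_all_different(["A", "B", "C", "D"])  # whole problem, brute force
left = count_all_different(["A", "B"])             # partial problem on {A, B}
right = count_all_different(["C", "D"])            # partial problem on {C, D}

# The two partial problems are independent, so their counts multiply.
assert total == left * right
```

Here the whole problem has 6 solutions, obtained as 3 solutions on {A, B} times 2 solutions on {C, D}.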



3.2 Integrating And/Or search into a propagation-based solver 

From a bird's eye view, And/Or search can be done easily in the context of 
propagation-based constraint solving. Algorithm 1 counts all solutions of a CSP P, 
decomposing the problem where possible. 

Ignoring line 5 for a moment, the algorithm runs a standard depth-first search
(DFS). The function Propagate in line 2 performs constraint propagation: it
maps a CSP $P = (X, \mathcal{D}, C)$ to a CSP $P' = (X, \mathcal{D}', C')$ such that
$P' \sqsubseteq P$. Propagate may remove entailed constraints from $C$. If $P'$ is
failed or solved, we return zero or one solution, respectively (lines 3, 4).
Otherwise (line 6), we split the problem into LeftChoice(P') and RightChoice(P').
These functions implement the search heuristic and will be discussed in more
detail in Sec. 4.2. As the branches correspond to a disjunction of $P'$, the
recursive counts add up to the total number of solutions.



Algorithm 1 Counting by Decomposition During Search

1: function DDS(P)
2:   P' ← Propagate(P)
3:   if IsFailed(P') return 0
4:   if IsSolved(P') return 1
5:   if Decompose(P') = (P'_1, P'_2) return DDS(P'_1) · DDS(P'_2)    ▷ decomposition
6:   return DDS(LeftChoice(P')) + DDS(RightChoice(P'))    ▷ choice
7: end function



The only addition that is necessary to turn DFS into an And/Or search
is line 5: If the problem can be decomposed into $P'_1$ and $P'_2$ (we simplify by
assuming only binary decomposition), these partial problems in conjunction are
equivalent to $P'$. Hence, we multiply the results of the recursive calls.

In contrast to investigating decomposability only on the initial CSP for a 
static variable selection, Algorithm 1 follows a dynamic approach: the check for 
decomposability is interleaved with propagation and normal search. As search 
progresses, more and more variables are assigned to values, and more and more 
constraints are detected to be entailed. We shall see that this greatly increases the 
potential for decomposition in Sec. 4. Furthermore, decomposition is completely 
independent of the implementation of LeftChoice and RightChoice, so any 
search heuristic can be used. 
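
To make the control flow of Algorithm 1 concrete, here is a minimal Python sketch. It is not the Gecode implementation: it assumes a toy CSP with only binary disequality constraints, a naive fixpoint propagator, a first-fail choice, and it decomposes into all connected components at once rather than into two. All function names are hypothetical.

```python
def propagate(domains, constraints):
    """Fixpoint propagation for binary != constraints; None signals failure."""
    domains = {x: set(d) for x, d in domains.items()}
    changed = True
    while changed:
        changed = False
        for x, y in constraints:
            for a, b in ((x, y), (y, x)):
                # An assigned variable removes its value from its neighbour.
                if len(domains[a]) == 1 and next(iter(domains[a])) in domains[b]:
                    domains[b] = domains[b] - domains[a]
                    changed = True
        if any(not d for d in domains.values()):
            return None
    # x != y is entailed once the two domains are disjoint: drop it.
    constraints = [(x, y) for x, y in constraints if domains[x] & domains[y]]
    return domains, constraints

def components(variables, constraints):
    """Connected components of the constraint graph over `variables`."""
    parent = {x: x for x in variables}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for x, y in constraints:
        parent[find(x)] = find(y)
    groups = {}
    for x in variables:
        groups.setdefault(find(x), []).append(x)
    return list(groups.values())

def dds(domains, constraints):
    """Count solutions, multiplying over independent partial problems."""
    result = propagate(domains, constraints)
    if result is None:
        return 0                                    # line 3: failed
    domains, constraints = result
    open_vars = [x for x in domains if len(domains[x]) > 1]
    if not open_vars:
        return 1                                    # line 4: solved
    parts = components(open_vars, constraints)
    if len(parts) > 1:                              # line 5: decomposition
        count = 1
        for part in parts:
            sub_domains = {x: domains[x] for x in part}
            sub_constraints = [(x, y) for x, y in constraints if x in part]
            count *= dds(sub_domains, sub_constraints)
            if count == 0:
                break                               # short-circuit on failure
        return count
    x = min(open_vars, key=lambda v: len(domains[v]))   # first-fail variable
    v = min(domains[x])
    left = dict(domains)
    left[x] = {v}                                   # branch x = v
    right = dict(domains)
    right[x] = domains[x] - {v}                     # branch x != v
    return dds(left, constraints) + dds(right, constraints)   # line 6: choice

# The example CSP of Sec. 3.1: four pairwise different variables.
example = {"A": {3, 5}, "B": {3, 4}, "C": {1, 2}, "D": {1, 2}}
names = list(example)
pairwise = [(x, y) for i, x in enumerate(names) for y in names[i + 1:]]
assert dds(example, pairwise) == 6
```

Because entailed disequalities are dropped after propagation, the example decomposes into {A, B} and {C, D} at the root without any prior choice.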

Short-circuit. As a straightforward optimization, we can employ short-circuit
reasoning in line 5. If DDS($P'_1$) returns no solutions, we do not have to compute
DDS($P'_2$) at all. Note the potential pitfall here: There are situations where DFS
detects failure easily, but DDS has to search a huge partial problem $P'_1$ before
detecting failure in $P'_2$. We come back to this in Sec. 4.2.

Enumerating solutions. Extending DDS to enumeration of solutions is straight- 
forward: We just have to return an empty list in case of failure (line 3), a single- 
ton list with a solution when we find one (line 4), and interpret addition as list 
concatenation and multiplication as combination of partial solutions. Instead of 
enumeration, we can also build up a tree-shaped compact representation of the 
solution space (as in Fig. 2 and later in Fig. 9c). 
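
The two list operations can be sketched as follows, assuming partial solutions are represented as dictionaries over disjoint variable sets. The names `combine` and `choose` are hypothetical.

```python
from itertools import product

def combine(left_solutions, right_solutions):
    """And-node: merge every pair of partial solutions (replaces multiplication)."""
    return [{**l, **r} for l, r in product(left_solutions, right_solutions)]

def choose(left_solutions, right_solutions):
    """Or-node: concatenate the solutions of both branches (replaces addition)."""
    return left_solutions + right_solutions

# Partial solutions of the example from Sec. 3.1.
ab = [{"A": 3, "B": 4}, {"A": 5, "B": 3}, {"A": 5, "B": 4}]
cd = [{"C": 1, "D": 2}, {"C": 2, "D": 1}]
all_solutions = combine(ab, cd)
```

An empty list on either side of `combine` yields an empty result, which mirrors multiplication by zero in the counting version.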

In the rest of this section, we show how to compute the Decompose function 
efficiently. 

3.3 Computing the decomposition 

We will now define formally when a CSP can be decomposed, and give a sufficient 
algorithmic characterization that leads to an efficient implementation. 
Restriction and independence. The restriction of a function $f : Y \to Z$ to
a set $X \subseteq Y$ is defined as

$f|_X : X \to Z, \quad x \mapsto f(x).$

We define the restriction of a CSP $P = (X, \mathcal{D}, C)$ to a set of variables
$X' \subseteq X$ by $P|_{X'} = (X', \mathcal{D}|_{X'}, C|_{X'})$, where

$C|_{X'} = \{ c \in C \mid \mathrm{vars}(c) \subseteq X' \}.$

A non-empty proper subset $\hat{X} \subset X$ is independent in a CSP
$P = (X, \mathcal{D}, C)$ if and only if

$\mathrm{sol}(P) = \emptyset$ or $\mathrm{sol}(P)|_{\hat{X}} = \mathrm{sol}(P|_{\hat{X}}).$

For $\hat{X}$ independent in $P$, we say that $P|_{\hat{X}}$ is a partial problem
of $P$. We can decompose $P$ if it has a partial problem.

The key to an efficient implementation of Decompose is to have an algorith- 
mic interpretation of what it means that a CSP can be decomposed into partial 
problems. We now show that connected components in the constraint graph of 
a CSP represent independent partial problems. 

A graph is connected if and only if there exists a path between every pair of
nodes. A connected component of a constraint graph $G_P$ is a maximal connected
subgraph.

Proposition 1 Consider a CSP $P = (X, \mathcal{D}, C)$ with constraint graph $G_P$. If
$\hat{X} \subset X$ is a connected component in $G_P$, then $P|_{\hat{X}}$ is a
partial problem of $P$.

Proof. There exists no hyperedge between a node $x \in \hat{X}$ and a node
$y \notin \hat{X}$, as connected components are maximal. This means that there is
no constraint between any $x$ and $y$ in $P$. We have to distinguish two cases:
If $P$ is unsatisfiable, $\hat{X}$ is trivially independent (by definition of
independence). Otherwise, take an arbitrary solution
$\hat{s} \in \mathrm{sol}(P|_{\hat{X}})$, and an arbitrary solution
$s \in \mathrm{sol}(P)$. Merging $\hat{s}$ into $s$ yields
$s' = (x \mapsto \hat{s}(x)$ for $x \in \hat{X}$, $x \mapsto s(x)$ otherwise$)$.
This $s'$ is again a solution of $P$, as all constraints on $X \setminus \hat{X}$
are still satisfied, and all constraints on $\hat{X}$ are satisfied, too. As we
picked $\hat{s}$ and $s$ arbitrarily, we get
$\mathrm{sol}(P)|_{\hat{X}} \supseteq \mathrm{sol}(P|_{\hat{X}})$. Because
$P|_{\hat{X}}$ covers all constraints of $C$ restricting $\hat{X}$, it follows that
$\mathrm{sol}(P)|_{\hat{X}} \subseteq \mathrm{sol}(P|_{\hat{X}})$.
Therefore, $\mathrm{sol}(P)|_{\hat{X}} = \mathrm{sol}(P|_{\hat{X}})$.

This result is not new [3, 10], but we repeat it to illustrate the central algo- 
rithmic idea. Connected components can be computed in linear time in the size 
of the graph, and incremental algorithms are available. We can thus implement 
Decompose as a simple connectedness algorithm on the constraint graph that 
yields all partial problems of the current CSP. 
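
As an illustration of the connectedness check, the following sketch computes the connected components of a constraint hypergraph with a union-find structure. It is a stand-in for whatever graph algorithm a solver uses, and the function name is hypothetical.

```python
def connected_components(variables, hyperedges):
    """Group variables that are linked, directly or transitively, by a hyperedge."""
    parent = {v: v for v in variables}

    def find(v):
        # Follow parent links to the representative, halving paths on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for edge in hyperedges:
        edge = list(edge)
        for other in edge[1:]:
            # All variables of one constraint end up in the same component.
            parent[find(other)] = find(edge[0])

    groups = {}
    for v in variables:
        groups.setdefault(find(v), []).append(v)
    return list(groups.values())

# vars(c) for two constraints: a binary one and a ternary one.
parts = connected_components(["A", "B", "C", "D", "E"], [("A", "B"), ("C", "D", "E")])
```

Each returned group is the variable set of one partial problem. With path halving alone, the running time is near-linear rather than strictly linear in the size of the graph.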

Finding more than one non-empty connected component is a sufficient con- 
dition for finding partial problems, but not a necessary one. As an example, 
consider the CSP that contains the trivial constraint allowing all combinations 
of values for x and y. Then x and y may still be independent, but the constraint
graph shows a hyperedge connecting the two variables, so that x and y will al- 
ways end up in the same connected component. In the following section, we will 
see how propagation-based solvers can deal with this. 

4 How DDS Interacts With Propagation and Search 

The previous section showed how DDS can be integrated into a propagation- 
based solver. But what are the consequences, how is decomposition affected by 
propagation and search, and how can it benefit from the search heuristic? 

4.1 Constraint graph dynamics 

Decomposition examines the constraint graph during search. This is vital as 
propagation and search modify the constraint graph dynamically - they narrow 
the domains of the problem's variables and remove some entailed constraints. 
The result is a sparser constraint graph with more potential for decomposition: 

Assignment. Clearly, an assigned variable $x$ ($|\mathcal{D}(x)| = 1$) is
independent of all other variables of the CSP. This implies that connections of
hyperedges to assigned variables can be removed from the constraint graph - the
constraint graph becomes sparser. Assignment increases the potential for
decomposition, since an assigned variable may have been responsible for keeping
two otherwise independent parts of the graph connected.

Entailment. Consider the example CSP $P = (X, \mathcal{D}, C)$ from the end of the
previous section, where variables $x \in X$ and $y \in X$ are connected by a trivial
constraint $c \in C$ allowing all possible value tuples. Obviously, $c$ is entailed
in $P$ and will not contribute to propagation any more. Formally, we have for
$P' = (X, \mathcal{D}, C \setminus \{c\})$ that $\mathrm{sol}(P') = \mathrm{sol}(P)$.
It is also obvious that the constraint graph for $P'$ is sparser than the one for
$P$: it contains one edge less. It is thus vital to our approach to detect
entailment of constraints as early as possible, and to remove entailed constraints
from $C$. Clearly, full entailment detection is coNP-complete. Most CP systems
(e.g. Gecode), however, implement a weak form of entailment detection in order to
remove propagators early, which our approach automatically benefits from.
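
The definition of entailment from Sec. 2 can be written down directly, as in the following sketch. It enumerates all value combinations and is therefore exponential in the arity of the constraint, which is why real propagators settle for cheaper, weaker tests. The function name and the extensional constraint representation are assumptions made for illustration.

```python
from itertools import product

def is_entailed(scope, allowed, domains):
    """True if every combination of current domain values is an allowed tuple."""
    return all(t in allowed for t in product(*(domains[x] for x in scope)))

# The constraint x != y over the values 0..3, given in extension.
not_equal = {(a, b) for a in range(4) for b in range(4) if a != b}

disjoint = {"x": {0, 1}, "y": {2, 3}}      # domains no longer overlap
overlapping = {"x": {0, 1}, "y": {1, 2}}   # the value 1 is still shared
```

With the disjoint domains the constraint is entailed and its edge can be removed from the constraint graph. With the overlapping domains it must stay.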

4.2 Search heuristics 

The applied search heuristic, encoded by LeftChoice and RightChoice, is ex- 
tremely important for the efficiency of the search. In particular, dynamic heuris- 
tics, natively supported by DDS, are known to be largely superior to static ones. 

In the following (and for our implementation) we refer to the common
variable-value heuristics that select a variable $x$ and a value
$v \in \mathcal{D}(x)$. Afterwards, the disjunction is done by
LeftChoice($P$) $= (X, \mathcal{D}, C \cup \{x \circ v\})$ and
RightChoice($P$) $= (X, \mathcal{D}, C \cup \{\neg(x \circ v)\})$, where $\circ$ is
some binary relation.
Variable selection. The variable selection strategy of such a heuristic is crucial
for the size of the search tree and often problem specific. Nevertheless, a
common method for variable selection is 'first-fail', which selects by minimal 
domain size. Other strategies use the degree in the constraint graph or the mini- 
mal/maximal/median value in the domain. Static variable orderings are in many 
cases inferior to these dynamic strategies. To gain the best search performance 
by DDS, the variable selection further has to induce constraint graph decompo- 
sitions as early as possible. 

Heuristics that maximize decomposition. In order to maximize the number 
of decompositions during search, the heuristic can be guided by the constraint 
graph. Such a heuristic may compute e.g. cut-points, bridges, or more powerful 
(minimal) cut-sets/separators [16]. Our framework is well prepared to accommo- 
date such complex strategies, in particular because our method already builds 
on access to the constraint graph. 

An open problem is the possible contradiction of heuristics aimed at de- 
composition versus fine-tuned problem specific heuristics. The heuristic aimed 
at decomposition may yield many partial problems that are not satisfiable and 
therefore lead to an inefficient search. On the other hand, the problem specific 
heuristic might yield no or only a few decompositions, which makes DDS un- 
profitable. Therefore, no general rule can be given. But our experiments suggest 
that a hybrid heuristic of selecting the variable with the highest node degree, 
and using the problem-specific heuristic as a tie breaker, yields good results (see 
Sec. 7). 

Order of exploration of partial problems. It is of high importance in which 
order the children of and-nodes, the partial problems, are explored. After
detecting inconsistency in one partial problem, the exploration of the remaining partial
problems is needless. Given an unsatisfiable problem, a good variable selection 
heuristic yields a failure during search as soon as possible to avoid unnecessary 
search (this is called the fail-first principle). Assuming that we already have such 
a good variable selection heuristic, we use it to guide the partial problem order- 
ing of DDS: (1) apply the selection to the whole (non-decomposed) variable set 
and (2) choose the partial problem first that contains the selected variable. That 
way, a good heuristic will lead to failure in the first explored partial problem if 
the decomposed problem has no solution. Further, it decreases the probability 
that the remaining partial problems are not satisfiable (in case the first is). This
mitigates the effect that failure may be detected late using DDS, mentioned in 
Subsec. 3.2. 
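
The two-step ordering rule can be sketched as follows, assuming first-fail (smallest domain) as the underlying variable selection heuristic. Any other heuristic could be substituted for the `min` call. The function name is hypothetical.

```python
def order_partial_problems(parts, domains):
    """Put the partial problem containing the heuristically selected variable first."""
    # Step (1): apply the variable selection to the whole variable set.
    selected = min((x for part in parts for x in part), key=lambda x: len(domains[x]))
    # Step (2): a stable sort moves the part containing `selected` to the front.
    return sorted(parts, key=lambda part: selected not in part)

domains = {"A": {1, 2, 3}, "B": {1, 2, 3}, "C": {4, 5}, "D": {4, 5, 6}}
ordered = order_partial_problems([["A", "B"], ["C", "D"]], domains)
```

In this usage example, C has the smallest domain, so the partial problem on {C, D} is explored first.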

5 Global Constraints 

One of the key features of modern constraint solvers is the use of global con- 
straints to strengthen propagation. Therefore, a search algorithm has to support 
global constraints in order to be practically useful in such systems. We describe 
the problems global constraints pose for DDS, and how to tackle them. 
For an $n$-ary (global) constraint, the constraint graph contains a hyperedge
that connects all $n$ variable nodes. Consider a CSP for All-different($w, x, y, z$)
with $w, x \in \{0, 1\}$, $y, z \in \{2, 3\}$. In this model, we cannot decompose,
although the binary constraints $a \neq b$ for $a \in \{w, x\}$, $b \in \{y, z\}$ are
entailed! Thus, in the current set-up, the global All-different unnecessarily
prevents decomposition.

The solution to this problem is to take the internal structure of global con- 
straints into account: the global constraint itself can be decomposed into smaller 
constraints (on fewer variables). We will call this the constraint decomposition 
to distinguish it from constraint graph decomposition. 

If we reflect the constraint decomposition of global constraints in the con- 
straint graph, we recover all the decompositions that were prevented by the 
global constraints before. 

For our example involving the All-different constraint, the constraint graph
would contain two constraints, $w \neq x$ and $y \neq z$, instead of the one global
All-different. The graph can now be decomposed.

As global constraints often cover a significant portion of the variables in a 
problem, global constraint decomposition is an essential prerequisite to make a 
constraint graph decomposition by DDS possible. Note that global constraint 
decomposition is independent of the other constraints present in the constraint 
graph; typical permutation problems, for instance, feature one global All-different 
that forces all variables to form a permutation, and then several other constraints 
that determine the concrete properties of the permutation. Applying DDS to 
such a problem is useless unless the constraint decomposition of the All-different 
is considered when computing connected components. 

In general, a global constraint can be decomposed if and only if its extension
(the set of allowed tuples) can be represented as a non-trivial product, i.e. a
product of non-singleton sets. For the above example, the set of allowed tuples
is $\{(0, 1), (1, 0)\} \times \{(2, 3), (3, 2)\}$. If a constraint can be represented
as such a non-trivial product $a \times b$, we can decompose it into two independent
constraints, one with the tuples of $a$, the other with the tuples of $b$.
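
For a constraint given in extension, the product condition can be tested for one candidate split of the variables, as in this sketch. It assumes the two position lists partition the tuple positions. Under that assumption the tuple set is always contained in the product of its two projections, so comparing sizes suffices. The function name is hypothetical.

```python
from itertools import product

def is_product(tuples, left_positions, right_positions):
    """Check whether `tuples` equals the product of its two projections."""
    left = {tuple(t[i] for i in left_positions) for t in tuples}
    right = {tuple(t[i] for i in right_positions) for t in tuples}
    # tuples is a subset of left x right, so equal sizes imply equality.
    return len(tuples) == len(left) * len(right)

# Allowed tuples of All-different(w, x, y, z) with w, x in {0, 1} and y, z in {2, 3}.
allowed = {
    t for t in product([0, 1], [0, 1], [2, 3], [2, 3]) if len(set(t)) == len(t)
}
```

Splitting the positions as (w, x) versus (y, z) yields a product, whereas splitting as (w, y) versus (x, z) does not. Trying all splits is exponential in the arity, which motivates the constraint-specific detection methods of Sec. 5.2.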

Formally, defined in terms of CSP solutions, we say a constraint $c \in C$ of a
CSP $P = (X, \mathcal{D}, C)$ is decomposable into the constraints $c_1, \ldots, c_k$
if and only if

$\mathrm{vars}(c) = \bigcup_{i \in [1,k]} \mathrm{vars}(c_i)$ and
$\forall i \neq j : \mathrm{vars}(c_i) \cap \mathrm{vars}(c_j) = \emptyset$ and
$\mathrm{sol}(P') = \mathrm{sol}(P)$

with $P' = (X, \mathcal{D}, C')$ and $C' = \{c_1, \ldots, c_k\} \cup C \setminus \{c\}$.

5.1 Non-decomposable constraints

Some global constraints are never decomposable during a constraint search, since
they cannot be decomposed for any domain $\mathcal{D}$.

For example, the tuples that satisfy a linear constraint, such as a linear
equation or inequation, can never be represented as a non-trivial product. The
reason for this is that each variable in a linear constraint functionally depends on
all other variables: for any two variables $x_i$ and $x_j$ in the constraint
$\sum_{i=1}^{n} a_i x_i = c$, picking a value for $x_i$ determines $x_j$, when all
other variables are assigned. Therefore, we cannot arbitrarily pick values from the
domains $\mathcal{D}(x_i)$ and $\mathcal{D}(x_j)$ such that the constraint is satisfied.

Consequently, subgraphs covered by linear constraints stay connected until
all their variables are assigned. When solving a problem where subsets of X are
covered by non-decomposable constraints, we can guide the search heuristic for 
a decomposition into these subsets. On the other hand, in problems where one 
non-decomposable constraint covers the whole variable set, it is obvious that 
search cannot profit from DDS. 

5.2 Decomposable constraints 

To take full advantage of DDS, efficient algorithms for the detection of possi- 
ble constraint decompositions are necessary. Good candidates for an integration 
are the propagation methods that already investigate the variable domains. Un- 
fortunately, these propagators are highly constraint specific and thus no single 
general detection procedure for all decomposable constraints can be given. But 
there are global constraints whose propagation algorithms and data structures 
either directly yield the decomposition as a by-product, or that can be easily 
extended for detection. 

In the following, we discuss three important propagators that can detect and 
compute a decomposition efficiently. 

All-different. In contrast to linear constraints, there is no functional
dependency between variables for All-different, but the exact inverse - variables
depend on the absence of values in a domain. This yields a maximal potential for
decomposition. Régin's propagator for All-different [22] employs a variable-value
graph, connecting each variable node with the value nodes corresponding to the
current domain. We can observe that All-different can be decomposed if and only if
the variable-value graph contains more than one connected component. In Fig. 3
the variable-value graph for the simple initial example is shown. As connected
components of this graph have to be computed anyway during the propagation,
we can get the constraint decomposition without any additional overhead. This
technique generalizes to the global cardinality constraint.
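
The component criterion can be illustrated independently of the propagator. The sketch below builds the variable-value graph from the current domains and groups the variables by connected component. It assumes the domains have already been propagated, and it does not reproduce the data structures of Régin's algorithm. The function name is hypothetical.

```python
def alldifferent_decomposition(domains):
    """Group variables by connected component of the variable-value graph."""
    parent = {}

    def find(n):
        parent.setdefault(n, n)
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for x, dom in domains.items():
        for v in dom:
            # One edge per (variable, value) pair of the current domain.
            parent[find(("val", v))] = find(("var", x))

    groups = {}
    for x in domains:
        groups.setdefault(find(("var", x)), []).append(x)
    return list(groups.values())

# Domains of the initial example of Sec. 5.
groups = alldifferent_decomposition(
    {"w": {0, 1}, "x": {0, 1}, "y": {2, 3}, "z": {2, 3}}
)
```

Each group corresponds to one smaller All-different constraint, here one on w and x and one on y and z.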

Slide. Introduced by Bessiere et al. [4], Slide slides a $k$-ary constraint $c$ over
a sequence of variables, i.e. it holds if $c(x_i, \ldots, x_{i+k-1})$ holds for all
$1 \le i \le n - k + 1$. Slide can be split into two at variable $x_d$ if the
individual constraints involving $x_d$ are entailed (see Fig. 4). Entailment happens
at the latest when all variables between $x_{d-k+1}$ and $x_{d+k-1}$ are assigned.
Depending on how soon the individual $c$ are entailed, we can decompose even earlier.
Slide establishes a dependency of a fixed width between variables, so that once that
width is reached, the constraint can be decomposed. Note that this constraint
decomposition is not complete and only detects certain non-trivial products along
the variable ordering. Computing the full constraint decomposition would require
insight into the structure of the constraint $c$.

Fig. 3. Decomposable variable-value graph of an All-different constraint.



Regular. Pesant's regular language membership constraint [18] states that the
values taken by a sequence of variables $x_1, \ldots, x_n$ belong to a regular
language $L$. It is essentially a constraint represented in extension, as arbitrary
tuples can be encoded into regular expressions. Propagation works on an unfolding
of a finite automaton accepting $L$, called the layered graph (see Fig. 5).

If now, at some point during propagation, one layer is left with a single
state (see Fig. 5), the graph can be split into two halves, making the singleton
state a new final state (for the left half) and start state (for the right half). They
correspond to regular expressions $R_l$ and $R_r$, covering the two substrings left and
right of that layer, such that the language generated by $R_l R_r$ is a sublanguage of
$L$ that contains exactly those strings still licensed by the variable domains. Note
that constraint decomposition is possible even without the variables $x_i$ being
assigned, but that it heavily depends on the actual automaton. Again, as for
Slide, this only detects those non-trivial products that are compatible with the
variable ordering of the regular constraint. Determining the full decomposition
for Regular would amount to finding a non-trivial product representation of its
allowed tuples, which cannot be computed efficiently.




Fig. 4. Decomposing Slide at variable $x_d$.

Fig. 5. Layered graph for a Regular constraint.

6 Implementation 

Our implementation of DDS extends Gecode, a C++ constraint programming li- 
brary. In this section, we give an overview of relevant technical details of Gecode, 
and discuss the four main additions to Gecode that enable DDS: access to the 
constraint graph, decomposing global constraints, integrating Decompose into 
the search heuristic, and specialized search engines. The additions to Gecode 
comprise only 2500 lines (5%) of C++ code and enable the use of DDS in any
CSP modeled in Gecode. DDS will be available as part of the next release of 
Gecode. 

6.1 Gecode 

The Gecode library [11] is an open source constraint solver implemented in C++. 
It lends itself to a prototype implementation of DDS because of four facts: 

1. Full source code enables changes to the available propagators. 

2. The reflection capabilities allow access to the constraint graph. 

3. Search is based on recomputation and copying, which significantly eases the 
implementation of specialized branchings and search engines. 

4. It provides good performance, so that benchmarks give meaningful results. 

6.2 Constraint graph 

In most CP systems, the constraint graph is implicit in the data structures for 
variables and propagators. Gecode, e.g., maintains a list of propagators, and 
each propagator has access to the variables it depends on. 

For DDS, a more explicit representation is needed that supports the com- 
putation of connected components. We can thus either maintain an additional, 
explicit constraint graph during propagation and search, or extract the graph 
from the implicit information each time we need it. For the prototype implemen- 
tation, we chose the latter approach. We make use of Gecode's reflection API, 
which allows iterating over all propagators and their variables. Through
reflection, we construct a graph using data structures from the Boost Graph Library [6],
which also provides the algorithm that computes connected components. 
Assigned variables are independent of all other variables as discussed in
Sec. 4. Therefore, they are reported as individual partial problems (connected 
components) but are ignored to avoid useless trivial decompositions without any 
effect. Instead these already solved single variable CSPs are added to an arbi- 
trary partial problem that covers at least one unassigned variable. If the final 
number of such "non-solved" partial problems is at least two, a problem decom- 
position is initialised. This significantly speeds up the search process because 
only profitable decompositions are done. 
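
The filtering of trivial components can be sketched as follows. This is an illustration of the rule described above, not the Gecode code. The function name and the list-based representation of components are assumptions.

```python
def profitable_decomposition(components, domains):
    """Return the non-solved partial problems, or None if fewer than two exist."""
    def is_open(part):
        # A component is non-solved if it covers at least one unassigned variable.
        return any(len(domains[x]) > 1 for x in part)

    open_parts = [list(p) for p in components if is_open(p)]
    solved_vars = [x for p in components if not is_open(p) for x in p]
    if len(open_parts) < 2:
        return None                      # no profitable decomposition
    # Attach the already solved variables to an arbitrary non-solved part.
    open_parts[0].extend(solved_vars)
    return open_parts

domains = {"A": {1}, "B": {1, 2}, "C": {1, 2}, "D": {3}, "E": {4, 5}, "F": {4, 5}}
parts = profitable_decomposition([["A"], ["B", "C"], ["D"], ["E", "F"]], domains)
```

In this usage example, the assigned variables A and D join the first non-solved partial problem, leaving two partial problems to search.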

6.3 Global constraint decomposition 

As discussed in Sec. 5, it is absolutely essential for the success of DDS to consider 
constraint decompositions of global constraints when computing the connected 
components. 

There are two possible implementation strategies for decomposing global con- 
straints. A propagator can either detect decomposability during propagation and 
replace itself with several propagators on subsets of the variables. Or, alterna- 
tively, the constraint decompositions are only computed on demand when the 
constraint graph is required for connected component analysis. We implemented 
the latter option. 

This again leaves two possible implementations. When the constraint graph 
is decomposed, one propagator (for a global constraint) may belong to two con- 
nected components. When search continues in the individual components, we 
can either use the propagator as it is in both components, or replace it by its 
decomposition. The latter option has the advantage that the smaller propaga- 
tor may be more efficient (as it can ignore the variables outside its connected 
component). However, for simplicity, we implemented the former. 
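
As an illustration of on-demand decomposition, the following Python sketch
groups the variables of a single All-different constraint when the constraint
graph is requested. It rests on two assumptions that are not taken from the
implementation: variables whose domains share no value (directly or
transitively) are treated as independent, and the values of assigned variables
have already been removed from all other domains by propagation. The name
alldifferent_groups is hypothetical.

```python
def alldifferent_groups(scope, domains):
    """Group the unassigned variables of one All-different constraint.

    Two variables end up in the same group if their domains overlap, directly
    or through a chain of other variables. Variables in different groups can
    never take the same value, so the constraint cannot connect them.
    """
    groups = []  # list of (variables, union of their domains)
    for v in scope:
        dom = domains[v]
        if len(dom) <= 1:
            continue  # assigned: assumed to be propagated already
        merged_vars, merged_vals = [v], set(dom)
        rest = []
        for group_vars, group_vals in groups:
            if group_vals & dom:
                # v links this group to everything merged so far
                merged_vars = group_vars + merged_vars
                merged_vals |= group_vals
            else:
                rest.append((group_vars, group_vals))
        groups = rest + [(merged_vars, merged_vals)]
    return [group_vars for group_vars, _ in groups]
```

Each returned group would contribute its own edges to the constraint graph, so
that a global constraint covering all variables no longer forces a single
connected component.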

6.4 Decomposing branchings 

Once we have identified connected components in the constraint graph, we have
to create the partial problems that correspond to these components. In Gecode,
we exploit the duality of choice and decomposition: both add branches to the
search tree. The following observation leads to a simple and efficient
implementation. If the heuristic is restricted such that it only selects
variables inside one connected component, propagation will also occur only for
variables of that component: for $X$ independent in $P$,
$\mathit{propagate}(P)|_X = \mathit{propagate}(P|_X)$.

In our Gecode implementation, Decompose is thus realized as a branching.
A branching in Gecode usually implements LeftChoice/RightChoice. For
DDS, we extend it to also implement Decompose: if decomposition is possible,
the branching limits further search to the variables in one connected component
per branch. Otherwise, it just creates the usual choices according to the
heuristic.

Branchings in Gecode are fully programmable. They have to support two
operations†: description and commit. description returns an abstract
description of the possible branches, while commit executes the branching
according to a given description and alternative number.

† In fact, branchings in Gecode have a slightly more complex interface, which we
deviate from to simplify presentation.

A decomposing branching in Gecode is a wrapper around a standard
variable-value branching. The actual work is done by description: it requests
the constraint graph and performs the connected component analysis. If
decomposition is possible, a special description is returned, representing the
independent subsets $X_i \subseteq X$. Otherwise, description is delegated to
the embedded variable-value branching. Note that Gecode supports n-ary
branchings, so decompositions do not have to be binary (as presented so far).

When commit is invoked with a variable-value description, the call is again
delegated to the embedded branching. For a decomposition description, the
branching's list of variables is updated for branch i to $X_i$, i.e. to the
variables still active in the selected component.
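
The wrapper structure can be sketched in Python as follows. This is not
Gecode's C++ interface: the classes ValueBranching and DecomposingBranching,
the tuple-based descriptions, and the find_components callback are hypothetical
stand-ins that only mirror the simplified description/commit interface used in
the text.

```python
class ValueBranching:
    """Stand-in for a standard variable-value branching (hypothetical)."""

    def __init__(self, variables):
        self.variables = list(variables)

    def description(self, domains):
        # First unassigned variable, smallest value: a deliberately naive
        # heuristic used only for illustration.
        for v in self.variables:
            if len(domains[v]) > 1:
                return ("choice", v, min(domains[v]))
        return None

    def commit(self, domains, desc, alternative):
        _, var, val = desc
        if alternative == 0:
            domains[var] = {val}                 # left branch: var = val
        else:
            domains[var] = domains[var] - {val}  # right branch: var != val


class DecomposingBranching:
    """Wrapper that adds decomposition to an embedded branching."""

    def __init__(self, inner, find_components):
        self.inner = inner
        # find_components: callable mapping the active variables to a list
        # of variable sets (the connected component analysis).
        self.find_components = find_components

    def description(self, domains):
        active = [v for v in self.inner.variables if len(domains[v]) > 1]
        comps = self.find_components(active)
        if len(comps) >= 2:
            return ("decompose", comps)  # one branch per component
        return self.inner.description(domains)

    def commit(self, domains, desc, alternative):
        if desc[0] == "decompose":
            # Restrict further search to the selected component.
            chosen = desc[1][alternative]
            self.inner.variables = [
                v for v in self.inner.variables if v in chosen
            ]
        else:
            self.inner.commit(domains, desc, alternative)
```

The sketch mutates the branching in place; a copying solver would instead
apply commit to a copy of the branching for each branch.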

6.5 Decomposition search engines 

As decomposition is performed by the branching, the search engines have to be
specialized accordingly. We developed four search engines for DDS. A counting
search engine computes the number of solutions of a given problem. A
general-purpose search engine allows searching the whole tree incrementally and
accessing all the partial solutions. Based on that, we provide a search engine
that enumerates all full solutions. A graphical search engine based on Gecode's
Gist (graphical interactive search tool) displays the search tree with special
decomposition nodes and gives an overview of where and how a particular problem
can be decomposed. Figure 6 shows a screenshot of a partial search using DDS.
Circular nodes with inner squares represent decompositions. All search engines
accept cut-off parameters for the number of (full, not partial!) solutions to
be explored.
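
The principle behind a counting search engine can be illustrated with a small,
self-contained Python sketch: choice nodes add up the counts of their branches,
decomposition nodes multiply the counts of their independent partial problems.
The sketch is restricted to binary inequality constraints, uses simple forward
checking as a stand-in for constraint propagation, and omits the cut-off
parameter; all function names are assumptions made for this example.

```python
def propagate(domains, edges):
    """Forward checking for x != y. Returns new domains, or None on failure."""
    domains = {v: set(d) for v, d in domains.items()}
    changed = True
    while changed:
        changed = False
        for x, y in edges:
            for a, b in ((x, y), (y, x)):
                if len(domains[a]) == 1 and domains[a] <= domains[b]:
                    domains[b] = domains[b] - domains[a]
                    if not domains[b]:
                        return None
                    changed = True
    return domains


def components(domains, edges):
    """Connected components among the unassigned variables."""
    unassigned = {v for v, d in domains.items() if len(d) > 1}
    adj = {v: set() for v in unassigned}
    for x, y in edges:
        if x in unassigned and y in unassigned:
            adj[x].add(y)
            adj[y].add(x)
    comps, seen = [], set()
    for start in sorted(unassigned):
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def count(domains, edges, decompose=True):
    """Number of solutions; decompose=False gives plain depth-first search."""
    domains = propagate(domains, edges)
    if domains is None:
        return 0
    comps = components(domains, edges)
    if not comps:
        return 1  # all variables assigned and consistent
    if decompose and len(comps) >= 2:
        total = 1
        for comp in comps:  # decomposition node: multiply partial counts
            sub = {v: domains[v] for v in comp}
            sub_edges = [(x, y) for x, y in edges if x in comp and y in comp]
            total *= count(sub, sub_edges, decompose)
            if total == 0:
                break
        return total
    # choice node: add the counts of the left and right branch
    var = min(comps[0], key=lambda v: (len(domains[v]), v))
    val = min(domains[var])
    left = {**domains, var: {val}}
    right = {**domains, var: domains[var] - {val}}
    return count(left, edges, decompose) + count(right, edges, decompose)
```

For a star-shaped graph, assigning the center makes all leaves independent, so
the decomposing variant counts each leaf once per center value instead of
enumerating every combination of leaf values.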

7 Applications and Empirical Results 

To illustrate possible use cases of DDS, we applied it to two counting problems
with global constraints. First, the widely known graph coloring problem allows
for a good and scalable illustration of the effects of DDS, because the
structure of the problem corresponds to the structure of the constraint graph.
It therefore serves as a model to investigate the impact of the problem
structure on DDS in the presence of global constraints. Second, we study the
benefit of DDS on the real-world problem of optimal protein structure
prediction. This problem can be modelled using constraint programming [2] but
requires global constraints that cover the whole problem. Thus, the discussed
constraint decomposition is an essential prerequisite to enable DDS.

Both applications show tremendous reductions in runtime and search tree 
size. 

The applications were realized using our DDS implementation in Gecode.
Only the search strategy was changed (DFS to DDS); modeling, variable
selection, and value selection were kept the same to allow a fair comparison of
the results. As dynamic variable selection, we chose maximal degree with
minimal domain size as tie breaker, which favors decomposition and works well
for DFS, too.
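
The variable selection used in the experiments can be sketched as follows. The
Python function select_variable and the degree dictionary are hypothetical; how
the degree of a variable is obtained from the solver is left open here.

```python
def select_variable(domains, degree):
    """Pick the unassigned variable with maximal degree.

    Ties are broken by minimal domain size.

    domains: dict mapping variable -> set of remaining values
    degree:  dict mapping variable -> number of constraints it occurs in
             (assumed to be supplied by the solver)
    """
    candidates = [v for v, d in domains.items() if len(d) > 1]
    if not candidates:
        return None
    # Maximize degree first; among equals, prefer the smaller domain.
    return max(candidates, key=lambda v: (degree[v], -len(domains[v])))
```

A plausible reading of why this favors decomposition is that assigning a
high-degree variable removes a hub from the constraint graph, which tends to
split it into several components.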



7.1 Graph coloring 

Graph coloring is an important and hard problem with applications in
scheduling, assignment of radio frequencies, and computer optimization
[17,23,26]. A proper coloring assigns different colors to adjacent nodes. We
want to evaluate the chromatic polynomial at the chromatic number, i.e. to
count the proper colorings that use the minimal number of colors. Graph
coloring is a useful benchmark because it gives us a scalable problem, so that
we can apply DDS to instances of varying complexity.

The constraint model. For a given undirected graph g and a number of colors
c we introduce one variable per node with the initial domains 0..(c - 1). For
each maximal clique of size > 2, we post an All-different constraint on the
corresponding variables. This maximizes the propagation necessary to solve
these problems but still enables DDS as we discuss below. For all remaining
edges we add binary inequality constraints.
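
A possible construction of this model is sketched below in Python. The clique
enumeration uses the basic Bron-Kerbosch algorithm, which is an assumption of
this example (the text does not state how the maximal cliques were computed);
the function names and the returned data layout are hypothetical as well.

```python
from itertools import combinations


def maximal_cliques(adj):
    """Bron-Kerbosch without pivoting; adj maps node -> set of neighbours."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in sorted(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def coloring_model(nodes, edges, c):
    """Return (domains, alldifferent scopes, binary inequality pairs)."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # One variable per node with initial domain 0..(c - 1).
    domains = {v: set(range(c)) for v in nodes}
    # One All-different constraint per maximal clique of size > 2.
    alldiff = [sorted(q) for q in maximal_cliques(adj) if len(q) > 2]
    # Binary inequalities for all edges not covered by such a clique.
    covered = {frozenset(e) for q in alldiff for e in combinations(q, 2)}
    neq = [(u, v) for u, v in edges if frozenset((u, v)) not in covered]
    return domains, alldiff, neq
```

For a triangle with a two-edge tail attached, this yields one All-different
constraint on the triangle and two binary inequalities for the tail.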

The test sets. We generated the two test sets GC-30 and GC-50 of graphs
with 30 and 50 nodes. For each size, random graphs were obtained by inserting
each edge of the complete graph with a fixed uniform edge probability $P^e$.
This was done using the Erdős-Rényi random graph generator GTgraph [12]. For
each edge probability $P^e$ from 16 to 40 percent, 2000 graphs were generated
and their colorings counted via DFS and DDS. To test highly degenerated
problems (with many solutions) as well, we stopped after 1 million solutions.
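
The random graph model can be illustrated with a few lines of Python. This is
only a stand-in for the generator used in the experiments: the function name
random_graph, its parameters, and the use of Python's random module are
assumptions of this sketch.

```python
import random
from itertools import combinations


def random_graph(n, p, seed=None):
    """Erdős-Rényi graph: keep each edge of the complete graph on n nodes
    independently with probability p. Returns a list of (u, v) with u < v."""
    rng = random.Random(seed)
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]
```

With n = 30 and p = 0.16, a graph generated this way has about
0.16 * 435 ≈ 70 edges on average, since the complete graph on 30 nodes has
30 * 29 / 2 = 435 edges.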

Results. For the test sets, Tab. 1 compares runtime and search tree size by
the average ratios of DFS to DDS (DFS/DDS). A figure of 100 thus means
that DDS is 100 times faster than DFS, or that the DFS search tree has 100 times
as many nodes as the one for DDS. A dash means that most of the problems
were not solved within a given time-out.



DFS/DDS   Test set   16 %    18 %    20 %    22 %    24 %    28 %    32 %    40 %
RT        GC-30      411.2   197.7   75.74   34.6    23.1    11.9    3.85    2.14
          GC-50      242.7   151.8   34.23   16.5    18.2    3.4     2.71    -
ST size   GC-30      680.3   344.4   142.0   74.48   62.27   33.96   10.90   4.97
          GC-50      646.1   383.8   94.28   47.26   41.69   11.6    9.28    -

Table 1. Average ratios of DFS vs. DDS for various edge probabilities (RT = runtime,
ST = search tree)



The presented runtime ratios show the high speedup for graphs with edge
probabilities $P^e$ < 40%. The distribution of speedup is exemplified in
Fig. 7. The speedup corresponds to an even larger reduction of the search tree
for DDS; the DDS search tree was larger than the DFS search tree for only 0.5%
of all problems. Furthermore, sparse graphs yield a much higher runtime
improvement than dense graphs, as visualized in Fig. 8. In contrast to runtime
and search tree size, the numbers of fails and propagations show no significant
effect of DDS.




Fig. 7. Histogram of logarithmic speedup for $P^e$ = 28% and 30 nodes (the
dashed line marks equal runtime).

Fig. 8. Avg. speedup decrease from 400 to 2 by graph density.



Still, the search tree reduction is not completely reflected in runtime speedup,
which illustrates the computational overhead of DDS in the current prototype
implementation. Nevertheless, our data shows that DDS is well suited to improve
solution counting even for dense graphs with $P^e$ of about 40%. We expect even
higher speedups and search tree reductions if the solutions are counted
completely, i.e. without the current upper bound of 1 million. Table 1 suggests
that the speedup decreases with an increasing number of nodes in the graph.
With an increasing number of nodes, the graph as well as the constraint graph
grow quadratically.

The speedup is significantly lower than the reduction of the search tree. In 
part this can be ascribed to our implementation that rebuilds the constraint 
graph in each search step. A system that provides cheaper constraint graph 
access, e.g. by maintaining it incrementally, is expected to perform much better. 

7.2 Optimal protein structure prediction 

The prediction of optimal (minimal energy) structures of simplified lattice
proteins is a hard (NP-complete) problem in bioinformatics. Here we focus on the
HP-model introduced by Lau and Dill [13]. In this model, a protein chain is
reduced to a sequence of monomers of equal size, whereby the 20 amino acids
are grouped into hydrophobic (H) or polar (P). A structure is a self-avoiding
walk on the underlying lattice (e.g. square or cubic). A contact energy function
is used to determine the energy of a structure. The energy table and an example
are given in Fig. 9. The problem is to predict minimal energy structures for a
given HP-sequence.
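
The energy of a structure in the HP-model can be computed as in the following
Python sketch. It assumes the usual contact energy of the model, in which each
contact between two H-monomers that are lattice neighbors but not successive in
the chain contributes -1 and all other contacts contribute 0. The function name
hp_energy and the coordinate representation are hypothetical.

```python
def hp_energy(sequence, coords):
    """Energy of an HP-structure on an integer lattice (square or cubic).

    sequence: string over {'H', 'P'}
    coords:   list of integer tuples, one lattice point per monomer
    """
    def dist(a, b):
        return sum(abs(p - q) for p, q in zip(a, b))

    if len(sequence) != len(coords):
        raise ValueError("one lattice point per monomer is required")
    if len(set(coords)) != len(coords):
        raise ValueError("structure is not self-avoiding")
    if any(dist(a, b) != 1 for a, b in zip(coords, coords[1:])):
        raise ValueError("successive monomers must be lattice neighbors")

    energy = 0
    for i in range(len(coords)):
        # j starts at i + 2: chain neighbors do not count as contacts
        for j in range(i + 2, len(coords)):
            if sequence[i] == "H" and sequence[j] == "H":
                if dist(coords[i], coords[j]) == 1:
                    energy -= 1
    return energy
```

For example, the sequence HPPH folded into a unit square brings its two
H-monomers into contact and has energy -1, whereas the same sequence stretched
out along a line has energy 0.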

The number and quality of optimal structures have applications in the study
of energy landscape properties, protein evolution and kinetics [9,20,25].

The constraint model. In [2], the problem was successfully modeled as a CSP
and named Constraint-based Protein Structure Prediction (CPSP). Here, a
variable is introduced for each sequence position, with lattice points as
domains*. The self-avoiding walk is modeled by a sequence of binary neighboring
constraints (ensuring the connectivity of successive monomers) and a global
All-different constraint for self-avoidingness. Supporting decomposition of the
All-different propagator, see Sec. 5, is therefore essential for profiting from
DDS.

[Figure 9; panel (a) tabulates the contact energies over H and P, with -1 for
an H-H contact]

Fig. 9. (a) Contact energy function (b) Example HP-structure with energy -2
(H-monomer: black, P-monomer: white, structure back bone: grey, HH-contact:
dotted) (c) Compression of the solution space from 9 complete down to 3 by 3
partial structures.

CPSP uses a database of pre-calculated point sets, called H-cores, that
represent possible optimal distributions of H-monomers. If the H-variables are
restricted to the positions of a given H-core, the optimization problem is
reduced to a satisfaction problem for that H-core. For optimal H-cores, the
solutions of the CSP are optimal structures. Thus, for counting all optimal
structures, one iterates through the optimal cores.

The test sets. We generated two test sets, PS-48 and PS-64, with uniformly
distributed random HP-sequences of length 48 and 64. For the generation we used
the freely available CPSP implementation [7]. With only minimal modifications
(new branching) we use the existing CSP model with DDS.

PS-48 contains 6350 HP-sequences; for each of them, up to 1 million optimal
structures in the cubic lattice were predicted. For the 2630 HP-sequences in
PS-64, up to 2 million structures were predicted in the cubic lattice, because
degeneracy increases with sequence length.

Results. The average ratios are given in Tab. 2. They show an enormous
search tree reduction, by an average factor of 11 for PS-48 and 25 for PS-64.
This reduction leads to far fewer propagations with DDS than with DFS (3- to
5-fold). Together with the slightly lower number of fails, this results in a
3- to 4-fold runtime speedup, using the same variable selection heuristics for
both search strategies. This demonstrates the potential of DDS even without
advanced heuristics specific to the constraint graph. It also shows the rising
advantage of DDS over DFS for increasing problem sizes (with higher solution
numbers).



* In practice, lattice positions are indexed by integers such that standard constraint 
solvers for finite domains over integers are applicable. 

DFS / DDS   runtime   ST size   fails   propagations
PS-48       2.98      11.30     1.40    3.27
PS-64       4.23      25.33     1.76    5.43

Table 2. Average ratios for CPSP using DFS vs. DDS (ST = search tree)



8 Discussion 

The paper introduces decomposition during search (DDS), an integration of
And/Or search with propagation-based constraint solvers. DDS dynamically
decomposes CSPs, avoiding much of the redundant work of standard tree search
when exploring huge search spaces, e.g. of #P-hard counting problems.

We discuss the interaction of DDS with essential features such as global
constraints and dynamic variable ordering. The techniques presented here
have been implemented for Gecode.

The empirical evaluation on graph coloring and protein structure prediction
shows the large potential of DDS in terms of search tree size reduction, and
an already high speedup in actual runtime. The speedup shows that DDS can be
implemented competitively and with a reasonable overhead. We expect even higher
speedups by improving the constraint graph representation and its incremental
maintenance, which is a current area of development. However, one experience
from our experiments is that it is highly problem-specific whether the
constraint graph allows for decomposition. We partly explain this by pointing
out that some constraints (e.g. linear (in-)equations) inherently hinder
decomposition.

We envision promising future research in the following directions. First, pro- 
viding efficient access to the constraint graph. Second, the development of specif- 
ically tailored heuristics for DDS focusing on dynamic variable selection or do- 
main splitting. Such heuristics should employ information about the constraint 
graph, to decompose the problem as often as possible and in a well-balanced 
way. Decomposition-directed heuristics might however counteract problem spe- 
cific heuristics. Balancing such heuristics is a further research direction. 

Finally, solving optimization problems using And/Or branch-and-bound (BAB) 
search [15] seems an obvious extension. However, our first experiments using a 
prototypical DDS extension of BAB show much smaller benefits than for count- 
ing (similar to the results in [15, 16]). 